Return-Path: <fine@cis.ohio-state.edu>
Received: from dxmint.cern.ch by nxoc01.cern.ch (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA27044; Tue, 12 Jan 93 23:36:39 MET
Received: by dxmint.cern.ch (5.65/DEC-Ultrix/4.3)
	id AA13287; Tue, 12 Jan 1993 23:51:47 +0100
Received: by soccer.cis.ohio-state.edu (5.61-kk/5.911008)
	id AA13895; Tue, 12 Jan 93 17:51:29 -0500
Date: Tue, 12 Jan 93 17:51:29 -0500
From: Thomas A. Fine <fine@cis.ohio-state.edu>
Message-Id: <9301122251.AA13895@soccer.cis.ohio-state.edu>
To: connolly@pixel.convex.com, @cis.ohio-state.edu@cis.ohio-state.edu
Subject: Re: HTML todo list
Cc: timbl@nxoc01.cern.ch, www-talk@nxoc01.cern.ch
X-Mailer: Perl Mail System v1.1

>>I don't think we should do any shortref magic. The simplest thing
>>(the way it works now) is that the two examples above are identical.
>>I say this is fine.
>
>But it's a royal pain to implement! Doing full SGML newline processing
>by the standard is quite involved (see the article by Eric Naggum
>in comp.text.sgml about SGML and Records that I referenced in
>an earlier message). For example, you can't just get rid of all
>newlines immediately before or after tags, like it says in the
>web: Only those right after a start tag (of a non-empty element),
>right before an end tag,
>or the ones on a line containing only comments and processing instructions.
>Newlines around <P> tags, for example, _are_ reported.
>
>If we don't stick the SHORTREF magic in the DTD to force the
>parser to report all newlines, we'll end up with countless hacks
>at newline processing, none of which matches the standard, and
>it'll be luck if any of them matches each other.

Not necessarily. Carefully define which new-lines have to be ignored.
This may yield something complex. But then, you are still free to
ignore more new-lines than that in several different places, thus
reducing the problem. In other words, it is up to the formatting
program to decide how to interpret them. If it decides to throw out
a few more new-lines at the beginning or end of various data elements,
life becomes much easier. You might have countless hacks, but since
formatters are allowed to format things differently, does it matter?

Take this example:

Here's some text<P>And some more text
<P>
And some final text.

SGML may say this:

\nHere's some text
(P
)P
And some more text\n
(P
)P
\nAnd some final text.

But the formatter is still free to toss those new-lines at the
beginning and end of each paragraph (and in fact it had better if
you don't want a space at the beginning of your paragraphs).

	tom
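
A minimal sketch of the formatter-side trimming described above, not part of the original message. It assumes a hypothetical event stream of (kind, value) tuples that loosely mirrors the parser output in the example; the event names and the function names are illustrative assumptions, not any real SGML parser's API.

```python
# Hypothetical event stream modelled on the parser output in the example:
# ("data", text) for character data, ("start", "P") / ("end", "P") for tags.
EVENTS = [
    ("data", "\nHere's some text"),
    ("start", "P"),
    ("end", "P"),
    ("data", "And some more text\n"),
    ("start", "P"),
    ("end", "P"),
    ("data", "\nAnd some final text."),
]


def trim_data_newlines(events):
    """Drop new-lines at the beginning and end of every data chunk.

    This is the formatter choosing to ignore more new-lines than the
    parser is required to, which is the simplification the text argues for.
    """
    trimmed = []
    for kind, value in events:
        if kind == "data":
            value = value.strip("\n")
            if not value:
                # A chunk that held only new-lines carries nothing to format.
                continue
        trimmed.append((kind, value))
    return trimmed


def paragraph_texts(events):
    """Return only the text chunks, after trimming, in document order."""
    return [value for kind, value in trim_data_newlines(events) if kind == "data"]


if __name__ == "__main__":
    for text in paragraph_texts(EVENTS):
        print(repr(text))
```

Tag events pass through unchanged, so a formatter built on this would still see every paragraph boundary; only the stray new-lines that would otherwise become leading or trailing spaces are discarded.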